Efficient Diskless Checkpointing and Log Based Recovery Schemes
Authors
Abstract
Checkpointing and message logging are popular, general-purpose tools for providing fault tolerance in distributed systems. Diskless checkpointing schemes enable frequent checkpointing without a performance penalty. The present work extends James S. Plank's diskless checkpointing scheme (N+1 Parity) by introducing a ‘Timeout’ mechanism for checkpointing programs with high locality of reference; this mechanism enables such applications to take checkpoints periodically. The limitation of the N+1 Parity scheme is that all processes freeze their computation while taking synchronous checkpoints. The proposed scheme solves this problem by introducing a new message logging technique, partial message logging, which allows asynchronous checkpointing at both sender and receiver. The correctness of the scheme is established through a set of proofs. The paper also evaluates the performance of the proposed scheme on a distributed simulator test-bed, and the results indicate that the proposed scheme outperforms the N+1 Parity scheme.
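To make the N+1 Parity baseline concrete, here is a minimal sketch of its core idea: N processes keep their checkpoints in memory, an extra (N+1th) processor stores the bitwise XOR of all N checkpoints, and a single lost checkpoint is rebuilt by XORing the parity with the surviving ones. It assumes every checkpoint is a byte buffer of equal length (padded if necessary) and that at most one process fails at a time. The function names `xor_bytes`, `compute_parity`, and `recover` are hypothetical; the sketch covers only the parity encoding and single-failure reconstruction, not the Timeout mechanism or partial message logging proposed in this paper.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length checkpoint buffers."""
    if len(a) != len(b):
        raise ValueError("checkpoint buffers must have equal length (pad them first)")
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(checkpoints: list[bytes]) -> bytes:
    """Parity checkpoint kept by the extra (N+1th) processor.

    Assumption: at least one checkpoint is supplied; the parity is the XOR of all N.
    """
    return reduce(xor_bytes, checkpoints)


def recover(parity: bytes, survivors: list[bytes]) -> bytes:
    """Rebuild the single lost checkpoint: parity XOR every surviving checkpoint."""
    return reduce(xor_bytes, survivors, parity)


# Demo: four application processes with equal-sized in-memory checkpoints.
ckpts = [b"proc0-st", b"proc1-st", b"proc2-st", b"proc3-st"]
parity = compute_parity(ckpts)

# Suppose process 2 fails: only the other three checkpoints and the parity remain.
survivors = ckpts[:2] + ckpts[3:]
print(recover(parity, survivors))  # b'proc2-st'
```

XOR is used because it is its own inverse, so one parity buffer is enough to tolerate any single failure; tolerating more simultaneous failures would need a stronger erasure code than plain parity.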
Similar resources
Using two-level stable storage for efficient checkpointing (IEE Proceedings - Software)
Checkpointing and rollback recovery is a very effective technique for tolerating failures. Usually, checkpoint data is saved on disk; however, in some situations the time to write the data to disk can represent a considerable performance overhead. Alternative solutions make use of main memory to maintain the checkpoint data. The paper starts by presenting two main memory ch...
Enhanced Two-level Fault Recovery Scheme Combined with Message Logging
Checkpointing schemes facilitate fault recovery in distributed systems. The two-level fault recovery scheme for distributed systems inherits the merits of both disk-based and diskless checkpointing schemes. The present work extends James S. Plank's diskless checkpointing scheme (N+1 Parity) by introducing a ‘Timeout’ mechanism to checkpoint programs with high locality of reference. This mechanism enables ap...
Adaptive Checkpointing
Checkpointing is a typical approach to tolerating failures in today's supercomputing clusters and computational grids. Checkpoint data can be saved in central stable storage, in processor memory (as in diskless checkpointing), or in local disk space (replacing memory with local disk in diskless checkpointing). But where to save the checkpoint data has a great impact on the performance of a...
Diskless Checkpointing
The precursor to this work (where diskless checkpointing was first described) was presented at FTCS-24 [27]. Abstract: Diskless checkpointing is a technique for checkpointing the state of a long-running computation on a distributed system without relying on stable storage. As such, it eliminates the performance bottleneck of traditional checkpointing on distributed systems. In this paper, we motiva...
Accommodating Logical Logging under Fuzzy Checkpointing in Main Memory Databases
This paper presents a simple and effective method to reduce the size of log data for recovery in main memory databases. Fuzzy checkpointing is known to be very efficient in main memory databases because its backup activities are asynchronous. Because of this feature, most past recovery work has used only physical logging schemes. Since the size of physical log records is quite large, physical logging s...